1 Introduction

Satisfiability is one of the most popular NP-complete problems. There are
two main types of algorithms for solving SAT, namely local search (for
references see, for example, [?]) and DPLL-type (this type was first
described in the work [5] of Davis and Putnam and [4] of Davis, Logemann
and Loveland). A lot of effort has been invested in proving
"less-than-$2^N$" upper bounds for such algorithms. In this paper we
concentrate on proving exponential lower bounds and consider two DPLL-type
algorithms: GUC (Generalized Unit Clause heuristic; introduced in [?]) and
Randomized GUC.

DPLL-type algorithms were historically the first "less-than-$2^N$"
algorithms for SAT. They receive as input a formula $F$ in CNF with
variables $x_1, \ldots, x_N$. A DPLL-type algorithm first simplifies the
input according to a certain set of transformation rules. If the answer is
now obvious (the simplified formula is either empty or contains a pair of
contradicting unit clauses), the algorithm returns it. Otherwise, it
chooses a literal $l$ in the formula according to a certain heuristic. Then
it constructs two formulas, one corresponding to $l := \text{true}$ and the
other to $l := \text{false}$, and recursively calls itself on them (note
that since we deal with the running time of the algorithm, the order in
which it calls itself on these two formulas does matter). If either call
returns the answer "Satisfiable", the algorithm also returns this answer.
Otherwise, it returns "Unsatisfiable". Therefore, such algorithms differ
from each other in two procedures: one for simplifying a formula, and the
other for choosing the next literal.
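
The scheme just described can be sketched in a few lines of Python. This is
only an illustration under our own assumptions, not the implementation
analyzed in this paper: literals are encoded as nonzero integers (`-v` is
the negation of `v`), and the names `assign`, `has_contradiction`, `dpll`,
`no_simplification` and `first_literal_of_shortest_clause` are hypothetical.
The two procedures that distinguish one DPLL-type algorithm from another
are passed in as parameters.

```python
from typing import Callable, FrozenSet

Clause = FrozenSet[int]    # a literal is a nonzero int; -v negates v
Formula = FrozenSet[Clause]


def assign(formula: Formula, lit: int) -> Formula:
    """Set lit to true: drop satisfied clauses, delete -lit from the rest."""
    return frozenset(c - {-lit} for c in formula if lit not in c)


def has_contradiction(formula: Formula) -> bool:
    """True if there is an empty clause or two contradicting unit clauses."""
    units = {next(iter(c)) for c in formula if len(c) == 1}
    return frozenset() in formula or any(-l in units for l in units)


def dpll(formula: Formula,
         simplify: Callable[[Formula], Formula],
         choose_literal: Callable[[Formula], int]) -> bool:
    """Return True iff the formula is satisfiable."""
    formula = simplify(formula)
    if not formula:
        return True
    if has_contradiction(formula):
        return False
    lit = choose_literal(formula)
    # The order of the two calls affects the running time, not the answer.
    return (dpll(assign(formula, lit), simplify, choose_literal)
            or dpll(assign(formula, -lit), simplify, choose_literal))


def no_simplification(formula: Formula) -> Formula:
    """Placeholder simplification procedure (assumption: do nothing)."""
    return formula


def first_literal_of_shortest_clause(formula: Formula) -> int:
    """Placeholder deterministic heuristic used only for this sketch."""
    shortest = min(formula, key=lambda c: (len(c), sorted(c)))
    return min(shortest)


unsat = frozenset(map(frozenset, [{1, 2}, {-1, 2}, {1, -2}, {-1, -2}]))
sat = frozenset(map(frozenset, [{1, 2}, {-1}]))
print(dpll(unsat, no_simplification, first_literal_of_shortest_clause))
print(dpll(sat, no_simplification, first_literal_of_shortest_clause))
```

Plugging a different pair of procedures into `dpll` yields a different
DPLL-type algorithm without touching the recursion itself.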

Superpolynomial lower bounds for regular resolution (and hence for
DPLL-type algorithms) have been known since [?]. In [?], a probability
distribution of exponentially hard formulas was considered, and an
exponential lower bound for two DPLL-type algorithms (GUC and UC) was
proved. Presumably, satisfiable instances exist among these formulas, but
this question remains open. Exponentially hard satisfiable instances were
found for local search algorithms (see [?]). However, no results have been
published about exponentially hard provably satisfiable formulas for
DPLL-type algorithms. In this paper we present such instances. Section 2 is
devoted to basic definitions; Section 3 gives examples of hard
unsatisfiable formulas; Sections 4 and 5 contain proofs of lower bounds for
the algorithms; and open questions are formulated in Section 6.

2 Preliminaries 

We denote by $X$ a set of boolean variables. The negation of a variable $x$
is denoted by $\bar{x}$. If $U \subseteq X$, then
$\bar{U} = \{\bar{x} \mid x \in U\}$. Literals are members of the set
$X \cup \bar{X}$. A clause is a set of literals that does not contain any
variable together with its negation. A formula in CNF is a finite set of
clauses. A clause is called unit if it consists of one literal. A literal
is called pure with respect to a formula if the formula contains the
literal but does not contain its negation. We denote by $PL(F)$ the
collection of all pure literals in $F$.

An assignment is a finite subset $I \subseteq X \cup \bar{X}$ that does not
contain any variable together with its negation. We denote by $F[I]$ the
formula that results from $F$ and an assignment
$I = \{x_1, x_2, \ldots, x_n\}$ after removing all clauses containing the
literals $x_i$ and deleting all occurrences of the literals $\bar{x}_i$
from the other clauses. An assignment $I$ is said to satisfy the formula
$F$ if $F[I]$ is the empty formula (that is, $F[I]$ contains no clauses).
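
To make these definitions concrete, here is a minimal Python sketch of
$F[I]$, of the satisfaction test, and of $PL(F)$. The integer encoding of
literals (`-v` for the negation of `v`) and the function names `restrict`,
`satisfies` and `pure_literals` are our own assumptions for illustration.

```python
def restrict(formula, assignment):
    """F[I]: remove clauses containing a literal of I, delete negated ones."""
    assignment = set(assignment)
    negated = {-lit for lit in assignment}
    return {frozenset(clause - negated)
            for clause in formula
            if not (clause & assignment)}


def satisfies(formula, assignment):
    """I satisfies F if F[I] contains no clauses."""
    return not restrict(formula, assignment)


def pure_literals(formula):
    """PL(F): literals that occur in F while their negations do not."""
    literals = set().union(*formula) if formula else set()
    return {lit for lit in literals if -lit not in literals}


# F = (x1 or not x2) and (x2 or x3) and (not x1 or x3)
F = {frozenset({1, -2}), frozenset({2, 3}), frozenset({-1, 3})}
print(restrict(F, {1}))       # the clauses {2, 3} and {3} remain
print(satisfies(F, {1, 3}))   # True
print(pure_literals(F))       # {3}
```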

For a formula $F(x_1, \ldots, x_n)$ we construct its binary assignment
tree. Its nodes are partial assignments for $F$ consisting of literals
$x_1, \ldots, x_n$ or their negations, and the sons of a node
$I = \{l_1, \ldots, l_k\}$, where $l_j \in \{x_j, \bar{x}_j\}$, are the
assignments $I_1 = \{l_1, \ldots, l_k, x_{k+1}\}$ and
$I_2 = \{l_1, \ldots, l_k, \bar{x}_{k+1}\}$. Following [2], we denote by
$C_i^F$ the collection of clauses in $F$ containing exactly $i$ literals
(we will omit the upper index if it is clear from context).

3 Hard unsatisfiable formulas

The first examples of unsatisfiable formulas requiring superpolynomial time
for regular resolution (shown in [?] to be equivalent to the Davis-Putnam
procedure as far as complexity is concerned) appeared in [?]. The examples
in that article were obtained by using boolean formulas based on graphs.
Tseitin used rather simple graphs, and his bounds were improved by Galil in
[?]. In [?], using graph-theoretic results on expanders, the bounds were
improved to the form $2^{cN}$, where $N$ is the number of variables in the
formula (from now on, $c$ denotes this very constant). Note that the bounds
proven in the present article depend on the best known bound for
unsatisfiable formulas and, therefore, will automatically improve if the
above-mentioned constant $c$ is increased.

Let us quote the following theorem from [?] (here $S_m$ is a formula that
always exists and is constructed earlier in the same article):

Theorem 1 ([?, Theorem 5.7]). There is a constant $c > 1$ such that for
sufficiently large $m$, any resolution refutation of $S_m$ contains $c^n$
distinct clauses, where $S_m$ is of length $O(n)$, $n = m^2$.

In [?], using a generalization of Tseitin's tautologies, the following
result was established: for every $k > 3$, there exists a constant
$c_k > 0$, $c_k = O(1/k^{1/8})$, such that every DPLL algorithm for
$k$-SAT has worst-case time complexity at least $\Omega(2^{N(1 - c_k)})$,
where $N$ is the number of variables in the formula.

It is also worth mentioning that the formulas in [?] and [?] have a linear
number of clauses, that is, there is a constant $b$ such that these
formulas have fewer than $bN$ clauses, where $N$ is the number of variables
in them.

We denote by $G_k(y_1, \ldots, y_N)$ the hard formula in $k$-CNF appearing
in [2] with $N$ variables $y_1, \ldots, y_N$.

4 Hard formulas for GUC 

The GUC algorithm is described in [?], and its procedure for making a
choice is shown in Figure 1. Essentially, it selects a random literal
satisfying a clause of the smallest size. Compared to the algorithm in [2],
we have added the pure literals rule to its choice heuristic: if the
negation of a literal does not occur in the formula, we automatically
satisfy this literal. Obviously, checking for pure literals can be done in
a polynomial number of steps (with respect to the number of variables). It
is also obvious that applying the pure literals rule cannot make the
current partial assignment contradictory. Note that our bounds also hold
for the original GUC algorithm (and, later, for the original Randomized GUC
algorithm instead of the modified version); in fact, the pure literals rule
will sometimes make our bounds worse.

Making a choice with the GUC algorithm

Input: A formula $F$ in CNF.

Method:

1. $m := \min\{i : C_i^F \neq \emptyset\}$.

2. If $m = 1$ then choose $C$ randomly from $C_1^F$ and set $l$ to the only
literal in $C$.

3. Else if $PL(F) \neq \emptyset$, choose $l$ randomly from $PL(F)$.

4. Else choose $C$ randomly from $C_m^F$, then choose $l$ randomly from
$C$.

5. Output $l$.

Figure 1: One step of the improved GUC algorithm
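
The choice procedure of Figure 1 can be sketched as follows. The integer
encoding of literals, the function name `guc_choice`, and the explicit
`rng` parameter are assumptions made for this illustration; the input is
assumed to be nonempty and to contain no empty clause.

```python
import random


def guc_choice(formula, rng):
    """One step of the GUC choice heuristic with the pure literals rule."""
    by_size = {}
    for clause in formula:
        by_size.setdefault(len(clause), []).append(clause)
    m = min(by_size)                       # step 1: smallest clause size
    if m == 1:                             # step 2: a unit clause exists
        (lit,) = rng.choice(sorted(by_size[1], key=sorted))
        return lit
    literals = set().union(*formula)
    pure = sorted(l for l in literals if -l not in literals)
    if pure:                               # step 3: pure literals rule
        return rng.choice(pure)
    clause = rng.choice(sorted(by_size[m], key=sorted))
    return rng.choice(sorted(clause))      # step 4: literal of a shortest clause


rng = random.Random(0)
print(guc_choice({frozenset({1, 2, 3}), frozenset({-1})}, rng))  # unit: -1
print(guc_choice({frozenset({1, 2}), frozenset({-1, 2})}, rng))  # pure: 2
```

Candidates are sorted before each random draw only so that a seeded
generator gives reproducible runs; this does not change the distribution.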

In this article we use the backtracking implementation described, for
example, in [?]. Basically, every time an algorithm splits on some
variable, it makes a choice, and the number of such choices measures its
efficiency. When first reaching a node, the algorithm marks the choice it
has to make as forced if it was made by using the transformation rules. In
our case, such choices occur when there is either a unit clause or a pure
literal in the formula. Otherwise we call the choice free. The backtracking
implementation of GUC will "go down the assignment tree" until it finds a
contradiction, and then backtrack to the last free choice. Then it flips
the value assigned during this last free choice, marks this choice as
forced, and continues. We measure the complexity of our algorithm as the
number of choices (both free and forced) it makes until it finds a
satisfying assignment.
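
The accounting of free and forced choices can be illustrated by the
following sketch. It is not the implementation referred to above: random
choices are replaced by deterministic tie-breaking so that the counts are
reproducible, and the names `restrict`, `choose`, `solve` and
`count_choices` are hypothetical.

```python
def restrict(clauses, lit):
    """Set lit to true: drop satisfied clauses, delete -lit from the rest."""
    return frozenset(c - {-lit} for c in clauses if lit not in c)


def choose(clauses):
    """Return (literal, is_forced) following the order of Figure 1."""
    m = min(len(c) for c in clauses)
    if m == 1:                                   # unit clause: forced
        unit = min((c for c in clauses if len(c) == 1), key=sorted)
        return next(iter(unit)), True
    literals = set().union(*clauses)
    pure = sorted(l for l in literals if -l not in literals)
    if pure:                                     # pure literal: forced
        return pure[0], True
    shortest = min((c for c in clauses if len(c) == m), key=sorted)
    return min(shortest, key=abs), False         # free choice


def solve(clauses, counts):
    if not clauses:
        return True                              # empty formula: satisfied
    if frozenset() in clauses:
        return False                             # empty clause: contradiction
    lit, forced = choose(clauses)
    counts["forced" if forced else "free"] += 1
    if solve(restrict(clauses, lit), counts):
        return True
    if forced:
        return False                             # backtrack past forced choices
    counts["forced"] += 1                        # the flipped choice is forced
    return solve(restrict(clauses, -lit), counts)


def count_choices(clauses):
    counts = {"free": 0, "forced": 0}
    return solve(frozenset(clauses), counts), counts


unsat = [frozenset(c) for c in ({1, 2}, {-1, 2}, {1, -2}, {-1, -2})]
print(count_choices(unsat))   # (False, {'free': 1, 'forced': 3})
```

On this small unsatisfiable formula there is one free choice, its flipped
counterpart, and one unit-clause choice in each of the two branches, which
gives four choices in total.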

Let us now proceed to proving the exponential lower bound on satisfiable
formulas. Consider the following formula (we denote by $x \lor E$ the set
of clauses obtained by adding $x$ to all clauses in $E$):

$$F = (x_1 \lor G_k(x_{M+1}, \ldots, x_{\lceil \frac{1+c_k}{c_k} M \rceil})) \land$$

$$\land\, (\bar{x}_1 \lor x_2 \lor \bar{x}_3) \land (\bar{x}_1 \lor x_3 \lor \bar{x}_4) \land \cdots \land (\bar{x}_1 \lor x_{M-1} \lor \bar{x}_M) \land (\bar{x}_1 \lor x_M \lor \bar{x}_2)$$

Note that the second line corresponds to $\bar{x}_1 \lor H$, where $H$ is a
formula forcing the variables $x_2, \ldots, x_M$ to have equal values. Also
note that while $G_k$ is a formula in $k$-CNF, $F$ is a formula in
$(k+1)$-CNF (and its first part is in $(k+1)$-CNF). At the first step, GUC
satisfies a random literal from a random clause of minimal size. With
probability $\frac{1}{3}$ this literal is $\bar{x}_1$. In this case, our
formula becomes
$G_k(x_{M+1}, \ldots, x_{\lceil \frac{1+c_k}{c_k} M \rceil})$, and the
algorithm will have to make at least $\mathrm{poly}(M)\, 2^M$ choices to
eliminate all leaves of the assignment tree ([?, Theorem 5.7]).

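With probability $\frac{2}{3}$, GUC chooses another literal $l$ to satisfy.
Let $l = x_2$ (it does not matter which one we choose due to symmetry).
Then

$$F[x_2] = (x_1 \lor G_k(x_{M+1}, \ldots, x_{\lceil \frac{1+c_k}{c_k} M \rceil})) \land$$

$$\land\, (\bar{x}_1 \lor x_3 \lor \bar{x}_4) \land (\bar{x}_1 \lor x_4 \lor \bar{x}_5) \land \cdots \land (\bar{x}_1 \lor x_{M-1} \lor \bar{x}_M) \land (\bar{x}_1 \lor x_M)$$

The formula now has a 2-clause, and during the next step GUC will either,
with probability $\frac{1}{2}$, satisfy $\bar{x}_1$, thus creating a hard
unsatisfiable instance, or satisfy $x_M$, and we are left with

$$F[x_2, x_M] = (x_1 \lor G_k(x_{M+1}, \ldots, x_{\lceil \frac{1+c_k}{c_k} M \rceil})) \land$$

$$\land\, (\bar{x}_1 \lor x_3 \lor \bar{x}_4) \land (\bar{x}_1 \lor x_4 \lor \bar{x}_5) \land \cdots \land (\bar{x}_1 \lor x_{M-1})$$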

Only when there are no 3-clauses left does the last remaining literal
become a pure literal, and the last 2-clause is decided automatically. It
follows by easy induction that the probability of setting
$x_1 = \text{false}$ (and forcing GUC to work for time
$\mathrm{poly}(M)\, 2^M$) is

$$P(x_1 = \text{false}) = 1 - \frac{2}{3} \cdot 2^{-M+3},$$

which tends to 1 exponentially fast as $M$ tends to $\infty$.
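
The induction can be unrolled as follows. This is only a sketch under an
assumption read off from the description above: after the first step
(which avoids $\bar{x}_1$ with probability $\frac{2}{3}$) exactly $M - 3$
two-way choices on a 2-clause remain, each avoiding $\bar{x}_1$ with
probability $\frac{1}{2}$. The symbol $p_{\mathrm{ok}}$ is introduced here
for illustration only.

```latex
p_{\mathrm{ok}}
  = \frac{2}{3} \cdot \left(\frac{1}{2}\right)^{M-3}
  = \frac{2}{3} \cdot 2^{-M+3},
\qquad
P(x_1 = \mathrm{false}) = 1 - p_{\mathrm{ok}} .
% Example: for M = 10, p_ok = (2/3) * 2^{-7} = 1/192, roughly 0.0052.
```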

If we now denote by $N$ the total number of variables in the formula, all
of the above proves the following

Theorem 2. For every $k > 4$ there exists a set of satisfiable formulas
$F_N$ in $k$-CNF such that the modified GUC algorithm has to make at least
$\mathrm{poly}(N)\, 2^{\frac{c_{k-1} N}{1 + c_{k-1}}}$ choices to find a
satisfying assignment, and $F_N$ contains $N$ variables and no more than
$aN$ clauses, where $a$ is a constant not depending on $N$ and
$c_k = O(1/k^{1/8})$.

5 Hard formulas for Randomized GUC 

It might seem that we succeeded with the GUC algorithm only because of its
highly determined behavior. The problem might lie in the necessity of
satisfying a shortest clause. Our formula in the preceding section "tricks"
GUC into the wrong subtree precisely because of this particular behavior.
In this section, we present a hard satisfiable instance for a modification
of the GUC algorithm, namely the Randomized GUC algorithm. One step of this
algorithm is shown in Figure 2. It chooses a literal randomly from the
shortest clause, but also randomly chooses whether to satisfy it. For
example, if the shortest clause is $a \lor b$, Randomized GUC could choose
any literal of the set $\{a, b, \bar{a}, \bar{b}\}$.

Making a choice with the Randomized GUC algorithm

Input: A formula $F$ in CNF.

Method:

1. $m := \min\{i : C_i^F \neq \emptyset\}$.

2. If $m = 1$ then choose $C$ randomly from $C_1^F$ and set $l$ to the only
literal in $C$.

3. Else if $PL(F) \neq \emptyset$, choose $l$ randomly from $PL(F)$.

4. Else choose $C$ randomly from $C_m^F$, then choose $l$ randomly from
$C \cup \bar{C}$.

5. Output $l$.

Figure 2: One step of the Randomized GUC algorithm

Randomized GUC would break the example in the preceding section. Indeed, on
the very first step it has a chance of $\frac{1}{6}$ to set
$x_1 = \text{true}$, thus reducing the formula to a very simple one.
Therefore, by restarting Randomized GUC we can achieve an arbitrarily high
probability of success.

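Let us consider the following formula (denoting
$G := G_k(x_{M+1}, \ldots, x_{\lceil \frac{2+3c_k}{3c_k} M \rceil})$ and
assuming $3 \mid M$ without loss of generality):

$$F = (x_1 \lor G) \land (x_2 \lor G) \land \ldots \land (x_M \lor G) \land$$

$$\land\, (x_1 \lor \bar{x}_2 \lor \bar{x}_3) \land (x_2 \lor \bar{x}_3 \lor \bar{x}_1) \land (x_3 \lor \bar{x}_1 \lor \bar{x}_2) \land$$

$$\land\, (x_4 \lor \bar{x}_5 \lor \bar{x}_6) \land (x_5 \lor \bar{x}_6 \lor \bar{x}_4) \land (x_6 \lor \bar{x}_4 \lor \bar{x}_5) \land \ldots$$

$$\ldots \land (x_{M-2} \lor \bar{x}_{M-1} \lor \bar{x}_M) \land (x_{M-1} \lor \bar{x}_M \lor \bar{x}_{M-2}) \land (x_M \lor \bar{x}_{M-2} \lor \bar{x}_{M-1}).$$

As in the case described above, an assignment satisfies $F$ if and only if
it sets the variables $x_1, x_2, \ldots, x_M$ to true.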

The Randomized GUC algorithm will first choose a random clause among the
shortest ones, that is, from the second part of our formula, and then a
random literal from the chosen clause. Since all literals appear
symmetrically, in fact it chooses a random literal among
$x_1, \ldots, x_M, \bar{x}_1, \ldots, \bar{x}_M$ to satisfy, with equal
probabilities. First note that if it chooses any of the negative literals
$\bar{x}_j$, then $F[\bar{x}_j]$ contains $G$ as an independent subformula.
It would take exponentially long for Randomized GUC to prove its
unsatisfiability, since a contradiction can be reached only in this
subformula (it is easy to see that the rest of the clusters of three
clauses cannot be reduced to an empty clause). So, with probability
$\frac{1}{2}$, the desired result is achieved. Suppose it chooses $x_1$
(without loss of generality, because the formula is symmetrical with
respect to the first $M$ variables). The formula now contains two
2-clauses, $(x_2 \lor \bar{x}_3)$ and $(x_3 \lor \bar{x}_2)$. The algorithm
now has to make a free choice with probability $\frac{1}{2}$ of success
(that is, choosing $x_2$ or $x_3$ rather than a negation). If it succeeds,
it gets a unit clause on the next step and chooses a value for the
remaining variable correctly.

In short, every cluster of three 3-clauses with similar variables has a
probability of $\frac{1}{4}$ of setting the correct values for its
variables, and the algorithm considers these clusters one at a time, one
after another. Therefore, the overall probability of success is

$$P(\forall i : 1 \le i \le M \;\; x_i = \text{true}) = \left(\frac{1}{4}\right)^{\frac{1}{3} M}.$$
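
The per-cluster probability can be checked by exact enumeration. The sketch
below rests on assumptions of ours: it looks at a single cluster in
isolation, taken to be $(x_1 \lor \bar{x}_2 \lor \bar{x}_3)$,
$(x_2 \lor \bar{x}_3 \lor \bar{x}_1)$, $(x_3 \lor \bar{x}_1 \lor \bar{x}_2)$,
encodes literals as integers, and counts a run as successful only if no
variable is ever set to false. The function names are hypothetical.

```python
from fractions import Fraction


def restrict(clauses, lit):
    """Set lit to true: drop satisfied clauses, delete -lit from the rest."""
    return frozenset(c - {-lit} for c in clauses if lit not in c)


def choice_distribution(clauses):
    """Map each literal to its probability in one Randomized GUC step."""
    m = min(len(c) for c in clauses)
    shortest = [c for c in clauses if len(c) == m]
    literals = set().union(*clauses)
    pure = sorted(l for l in literals if -l not in literals)
    dist = {}
    if m == 1:                                   # unit clause rule
        for c in shortest:
            (lit,) = c
            dist[lit] = dist.get(lit, 0) + Fraction(1, len(shortest))
    elif pure:                                   # pure literals rule
        for lit in pure:
            dist[lit] = Fraction(1, len(pure))
    else:                                        # literal from C and its negations
        for c in shortest:
            candidates = c | {-l for l in c}
            for lit in candidates:
                weight = Fraction(1, len(shortest) * len(candidates))
                dist[lit] = dist.get(lit, 0) + weight
    return dist


def prob_all_true(clauses, variables, assigned=frozenset()):
    """Probability that every variable ends up assigned true."""
    if any(l < 0 for l in assigned) or frozenset() in clauses:
        return Fraction(0)
    if not clauses:
        return Fraction(int(assigned == variables))
    return sum((p * prob_all_true(restrict(clauses, lit), variables,
                                  assigned | {lit})
                for lit, p in choice_distribution(clauses).items()),
               Fraction(0))


cluster = frozenset(map(frozenset, [{1, -2, -3}, {2, -3, -1}, {3, -1, -2}]))
p_cluster = prob_all_true(cluster, frozenset({1, 2, 3}))
M = 6
print(p_cluster, p_cluster ** (M // 3))   # 1/4 1/16
```

Since the clusters are handled one after another, raising the per-cluster
value to the power $M/3$ gives the overall probability stated above.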

In case of failure, the time Randomized GUC will require to prove the
unsatisfiability of $G$ is $\mathrm{poly}(M)\, 2^{\frac{2}{3} M}$. All of
the above proves the following

Theorem 3. For every $k > 4$ there exists a set of satisfiable formulas
$F_N$ in $k$-CNF such that the Randomized GUC algorithm has to make at
least $\mathrm{poly}(N)\, 2^{\frac{2 c_{k-1} N}{2 + 3 c_{k-1}}}$ choices to
find a satisfying assignment, and $F_N$ contains $N$ variables and no more
than $aN^2$ clauses, where $a$ is a constant not depending on $N$ and
$c_k = O(1/k^{1/8})$.

6 Further work 

In this paper we proved an exponential lower bound on satisfiable formulas
for two DPLL-type algorithms. However, the "hard" formulas for the
Randomized GUC algorithm turned out to have a quadratic relationship
between the number of clauses and the number of variables. It would be
interesting to construct similar linear-sized formulas.

Also, apart from the unit clause and pure literal principles, a number of
other heuristics are used in modern DPLL-type SAT solvers. Such heuristics
include the resolution rule, the "black-and-white literals" principle, etc.
(for more information see [3]). Similar bounds are still to be proven for
algorithms employing these heuristics.